(57) Abstract:

PURPOSE: To reduce the storage capacity required for arithmetic and to carry out arithmetic
more efficiently and at higher speed.

CONSTITUTION: A control part 2 divides a matrix A row by row, forms arithmetic data for each
row from the combination of element-count data bi expressing the number of non-zero elements
in the row, the column numbers ci of all the non-zero elements, and the values di of those
elements, and supplies the arithmetic data and the data xi of a vector X to respective
arithmetic processing parts 1a-1d. The arithmetic processing parts 1a-1d store the arithmetic
data in a SAM 20 and store the data xi of the vector X in a RAM 22. Each of the arithmetic
processing parts 1a-1d reads the element-count data bi stored in the SAM 20, reads the column
numbers ci and the values di of the elements based on the element-count data bi, and reads the
data xi of the vector X corresponding to the column numbers ci from the RAM 22. The arithmetic
is performed by an arithmetic part 11, and the arithmetic to calculate the inner product of the
matrix A and the vector X is executed in parallel at the arithmetic processing parts 1a-1d.
CLAIMS


[Claim(s)] 

[Claim 1] A parallel operation processor comprising: two or more data-processing sections, each
equipped with a storage section which stores operation data and an operation part which
performs operations on the operation data stored in the storage section; and a control section
which divides data into fields, forms operation data from the set of element-count data
expressing the number of non-zero elements in a field, location data expressing the locations
of the non-zero elements within the field, and value data expressing the values of the non-zero
elements at the locations corresponding to the series of location data, supplies the operation
data to each data-processing section, and controls the operation of each data-processing
section; characterized in that each data-processing section reads the element-count data stored
in its storage section, reads the location data and value data based on the element-count data,
and performs the operation in its operation part, so that the operation on the data is
performed in parallel in the two or more data-processing sections.
[Claim 2] The parallel operation processor according to claim 1, characterized in that the
control section divides matrix-form data row by row, uses the number of non-zero elements in
each row as the element-count data and the location of each non-zero element within the row as
the location data, forms the operation data for each row of the matrix-form data, and supplies
the operation data to each data-processing section.
[Claim 3] The parallel operation processor according to claim 1 or 2, characterized in that the
storage section has a serial access memory which performs continuous writing and continuous
reading of data.

[Claim 4] The parallel operation processor according to any one of claims 1 to 3, characterized
in that the control section supplies operation data to each data-processing section so that the
loads of the data-processing sections become equal.


DETAILED DESCRIPTION 


[Detailed Description of the Invention] 
[0001] 

[Industrial Application] This invention relates to a parallel operation processor which
performs operations on data in parallel with two or more operation parts, and in particular to
a parallel operation processor which performs operations on matrix-form data.
[0002]

[Description of the Prior Art] With the development and spread of computers in recent years,
large-scale scientific calculations once considered impossible to carry out have become
practical. For example, in the finite difference method, the finite element method, and the
boundary element method, which are used for electromagnetic-field analysis and fluid flow
analysis, so-called matrix operations, such as solving very large systems of simultaneous
equations or computing the eigenvalues of a matrix, appear frequently in the course of the
analysis.

[0003] Moreover, a larger matrix can be used to raise analysis precision, but as the matrix
grows, the operation takes a very long time even on a supercomputer capable of high-speed
operation, so faster processing is desired.

[0004] As the processing speed of the data-processing section has improved, matrix operations
of such large size have called for storage that is both large in capacity and fast. However,
storage capacity and operating speed are generally in a trade-off relation, so storage that is
both large and fast makes the equipment very expensive and very large.
[0005] For this reason, for example, as shown in drawing 7, a cache memory 62, which is fast
but comparatively small in capacity, is provided between the data-processing section (CPU) 61
and the large-capacity main storage 63. Data are read in advance from main storage 63 into
cache memory 62 through the bus line 65, and CPU 61 reads data from cache memory 62 through a
bus line 64, so that high-speed read-out of data is realized and the operation is accelerated.

[0006] Moreover, as shown in drawing 8, in order to perform the same operation at high speed on
a series of data, as in matrix operations, a processing unit (a so-called supercomputer) is
known which has two or more vector registers 77, each storing a vector, i.e., a series of data,
performs operations on the series of data stored in the vector registers 77 with floating-point
(FP) computing elements such as an adder 70, a multiplier 71, and a divider 72, and accelerates
the operation by pipelining these processes.

[0007] In this processing unit, data are transferred in advance of an operation from main
storage 81 to the vector registers 77 through data lines 78 and 80 by the vector I/O device
(vector I/O) 79. The data transferred to the vector registers 77 are supplied to the computing
elements 70-72 through the input bus line 73 and, after the operation is completed, written
back to the vector registers 77 through the output bus line 74. The operation results written
in the vector registers 77 are then stored again in main storage 81 through the vector I/O 79.
[0008] 

[Problem(s) to be Solved by the Invention] However, in the processing unit using the cache
memory 62, when data other than those read in advance from main storage 63 into cache memory 62
are needed (that is, when a cache miss occurs), the data must be read from main storage 63. In
operations on large matrices, the capacity of cache memory 62 is generally too small, so data
not held in cache memory 62 must be read from main storage 63 each time. CPU 61 stalls during
this read-out, and the overall operation speed falls.
[0009] Likewise, in the processing unit equipped with the vector registers 77, the capacity of
the vector registers 77 is small, so in operations on large matrices the matrix elements (data)
that cannot be held in the vector registers 77 must generally be read from main storage 81
before being operated on, which reduces the operation speed.

[0010] Therefore, paying attention to characteristics of matrix operations such as the order in
which data are read, the present applicant previously proposed, as Japanese Patent Application
No. 5-149597, a parallel operation processor, shown in drawing 9 and drawing 10, which performs
operations in parallel in two or more data-processing sections, each provided with a serial
access memory which performs continuous writing and continuous reading of data, and which
thereby accelerates the operation and realizes lower cost and smaller size.

[0011] As shown in drawing 9, this parallel operation processor has m data-processing sections
101a, 101b, 101c, ..., 101m which perform operations, a control section 102 which outputs
control signals that control the operation of these data-processing sections 101a-101m,
operation data, and the like, a data bus 103 which supplies the operation data and the like
from the control section 102 to the data-processing sections 101a-101m, and a control bus 104
which supplies the control signals from the control section 102 to the data-processing sections
101a-101m.

[0012] As shown in drawing 10, each of the data-processing sections 101a-101m has a storage
section 110 which stores the operation data supplied from the control section 102, an operation
part 111 which performs operations on the data stored in the storage section 110, a buffer 113
which communicates with the data bus 103 through an internal data bus 112, a buffer 115 which
communicates with the control bus 104 through an internal control bus 114, and a control signal
generating section 116 which generates the control signals required for communication and
supplies them to the buffer 115 and the operation part 111.
[0013] Furthermore, as shown in drawing 10, the storage section 110 consists of a
large-capacity continuous I/O memory (hereafter, large-capacity serial access memory (SAM)) 120
which performs continuous writing and continuous reading of data, a small-capacity SAM 121, and
a high-speed general-purpose RAM 122 of modest capacity which performs random writing and
random reading of data.
[0014] In this parallel operation processor, when computing the product of the matrix A shown,
for example, in the following formula 1 and a vector X, the control section 102 divides the
matrix A and assigns and supplies the data row by row to the data-processing sections. In the
case of formula 1, for example, two rows of the 8x8 matrix are assigned and supplied to each of
the four data-processing sections 101a-101d: the 1st and 5th rows to data-processing section
101a, the 2nd and 6th rows to data-processing section 101b, the 3rd and 7th rows to
data-processing section 101c, and the 4th and 8th rows to data-processing section 101d.
[0015] 
[Equation 1] 
[0016] The control section 102 also supplies the elements xi of the vector X to the
data-processing sections 101a-101d. As shown in drawing 11, each of the data-processing
sections 101a-101d stores its rows of the matrix A in the large-capacity SAM 120 and stores the
elements xi of the vector X in the RAM 122. At the time of the operation, the operation part
111 of each data-processing section 101a-101d continuously reads the elements aij of each row,
reads the element of the vector X corresponding to each element aij from the RAM 122, performs
the operation shown in the following formula 2, and stores the resulting element yi of the
vector Y in the RAM 122. The control section 102 then gathers the elements yi computed in the
data-processing sections 101a-101d to obtain the vector Y. As a result, this parallel operation
processor can perform matrix operations at high speed.
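
The earlier scheme described in paragraphs [0014] and [0016] can be sketched roughly as
follows. This is only an illustrative sketch: the function names are hypothetical, rows are
numbered from 0 rather than 1, and the sections are simulated one after another instead of
running in parallel.

```python
# Hypothetical sketch of the earlier dense scheme: rows are dealt out cyclically.

def assign_rows_cyclic(n_rows, n_sections):
    """Row i (0-based) goes to section i % n_sections (rows 1 and 5 to 101a, etc.)."""
    return [list(range(s, n_rows, n_sections)) for s in range(n_sections)]


def dense_section_product(rows, A, x):
    """Compute y_i = sum_j a_ij * x_j for every row i owned by one section."""
    return {i: sum(a_ij * x_j for a_ij, x_j in zip(A[i], x)) for i in rows}


def parallel_dense_matvec(A, x, n_sections):
    """Gather the partial results of all sections into the vector Y."""
    y = [0] * len(A)
    for rows in assign_rows_cyclic(len(A), n_sections):
        for i, y_i in dense_section_product(rows, A, x).items():
            y[i] = y_i
    return y
```

Every element, zero or not, is stored and multiplied in this scheme, which is the cost the
following paragraphs address.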
[0017] 
[Equation 2] 


[0018] On the other hand, the large-scale matrices which appear in scientific calculation, as
typified by the finite difference method, the finite element method, linear programming, and
electric circuit analysis, are in many cases sparse matrices, most of whose elements are 0. If
all the elements of such a sparse matrix are stored for the operation, a huge storage capacity
is needed.

[0019] Processing units have therefore been considered which shorten the operation time by not
performing operations on the zero elements of the sparse matrix, or which do not store the zero
elements.

[0020] However, a processing unit which skips operations on the zero elements of the sparse
matrix needs, at the time of the operation, positional information on the non-zero data in the
sparse matrix. It must either store all the elements and test each one at the time of the
operation, or store the positional information of the non-zero data in advance. In either case
the time taken to read this information increases the overall operation time, and the approach
is difficult to apply to the parallel operation processor using the serial access memory
described above.

[0021] Moreover, a processing unit which does not store the zero elements of the sparse matrix
must store the positional information of the operation data in addition to the operation data
themselves, so that the storage capacity required for the operation increases and the overall
operation time increases by the time taken to read the positional information.

[0022] This invention was made in view of the above problems, and aims to provide a parallel
operation processor which can perform operations on large matrices at high speed, reduce the
storage capacity required for the operation, improve cost performance, and be made smaller.

Machine English translation of JP 07-239843

[0023] 

[Means for Solving the Problem] In order to solve the above-mentioned problem, the parallel
operation processor concerning this invention has two or more data-processing sections, each
equipped with a storage section which stores operation data and an operation part which
performs operations on the operation data stored in the storage section, and a control section
which divides data into fields, forms operation data from the set of element-count data
expressing the number of non-zero elements in a field, location data expressing the locations
of the non-zero elements within the field, and value data expressing the values of the non-zero
elements at the locations corresponding to the series of location data, supplies the operation
data to each data-processing section, and controls the operation of each data-processing
section. It is characterized in that each data-processing section reads the element-count data
stored in its storage section, reads the location data and value data based on the
element-count data, and performs the operation in its operation part, so that the operation on
the data is performed in parallel in the two or more data-processing sections.
[0024] Moreover, the parallel operation processor concerning this invention is characterized in
that the control section divides matrix-form data row by row, uses the number of non-zero
elements in each row as the element-count data and the location of each non-zero element within
the row as the location data, forms the operation data for each row of the matrix-form data,
and supplies the operation data to each data-processing section.
[0025] Moreover, the parallel operation processor concerning this invention is characterized in
that the storage section is equipped with a serial access memory which performs continuous
writing and continuous reading of data.

[0026] Moreover, the parallel operation processor concerning this invention is characterized in
that the control section supplies operation data to each data-processing section so that the
loads of the data-processing sections become equal.
[0027] 

[Function] In the parallel operation processor concerning this invention, the control section
divides matrix-form data into fields, forms operation data from the set of element-count data
expressing the number of non-zero elements in a field, location data expressing the locations
of the non-zero elements, and value data expressing the values of the non-zero elements at the
locations corresponding to the series of location data, supplies the operation data to each
data-processing section, and controls the operation of each data-processing section. Each
data-processing section passes the operation data supplied from the control section to its
storage section, where the operation data are stored. Each data-processing section then reads
the element-count data stored in the storage section, reads the location data and value data
based on the element-count data, and performs the operation in parallel with the other
data-processing sections.

[0028] Moreover, in the parallel operation processor concerning this invention, the control
section divides matrix-form data row by row, uses the number of non-zero elements in each row
as the element-count data and the location of each non-zero element within the row as the
location data, forms the operation data for each row of the matrix-form data from these and
from the value data expressing the values of the non-zero elements at the locations
corresponding to the series of location data, supplies the operation data to the
data-processing sections, and controls the operation of each data-processing section. Each
data-processing section passes the operation data supplied from the control section to its
storage section, where the operation data are stored. Each data-processing section then reads
the element-count data stored in the storage section, reads the location data and value data
based on the element-count data, and performs the operation in parallel with the other
data-processing sections.

[0029] Moreover, in the parallel operation processor concerning this invention, the control
section divides data into fields, forms operation data from the set of element-count data
expressing the number of non-zero elements in a field, location data expressing the locations
of the non-zero elements, and value data expressing the values of the non-zero elements at the
locations corresponding to the series of location data, supplies the operation data to each
data-processing section, and controls the operation of each data-processing section. Each
data-processing section continuously supplies the operation data received from the control
section to its serial access memory, where the operation data are stored. Each data-processing
section then reads the element-count data stored in the serial access memory, continuously
reads the location data and value data based on the element-count data, and performs the
operation in parallel with the other data-processing sections.

[0030] Moreover, in the parallel operation processor concerning this invention, the control
section divides data into fields, forms operation data from the set of element-count data
expressing the number of non-zero elements in a field, location data expressing the locations
of the non-zero elements, and value data expressing the values of the non-zero elements at the
locations corresponding to the series of location data, supplies the operation data to each
data-processing section so that the loads of the data-processing sections become equal, and
controls the operation of each data-processing section. Each data-processing section passes the
operation data supplied from the control section to its storage section, where the operation
data are stored. Each data-processing section then reads the element-count data from the
storage section, reads the location data and value data based on the element-count data, and
performs the operation in its operation part in parallel with the other data-processing
sections.
[0031] 

[Example] Hereafter, a preferred example of the parallel operation processor concerning this
invention is explained in detail with reference to the drawings.

[0032] As shown, for example, in drawing 1, this parallel operation processor has m
data-processing sections 1a, 1b, 1c, ..., 1m which perform operations, a control section 2
which outputs control signals that control these data-processing sections 1a-1m, operation
data, and the like, a data bus 3 which supplies the operation data from the control section 2
to the data-processing sections 1a-1m, and a control bus 4 which supplies the control signals
from the control section 2 to the data-processing sections 1a-1m.

[0033] As shown, for example, in drawing 2, each of the data-processing sections 1a-1m has a
storage section 10 which stores the operation data supplied from the control section 2, an
operation part 11 which performs operations on the data stored in the storage section 10, a
buffer 13 which communicates with the data bus 3 through an internal data bus 12, a buffer 15
which communicates with the control bus 4 through an internal control bus 14, and a control
signal generating section 16 which generates the control signals required for communication and
supplies them to the buffer 15 and the operation part 11. Each of the data-processing sections
1a-1m performs its operation according to the control signal from the control section 2.
[0034] As shown in drawing 2, the storage section 10 consists of a large-capacity continuous
I/O memory (hereafter, large-capacity serial access memory (SAM)) 20 which performs continuous
writing and continuous reading of data, a small-capacity SAM 21, and a high-speed
general-purpose RAM 22 of modest capacity which performs random writing and random reading of
data.
[0035] As shown, for example, in drawing 3, the large-capacity SAM 20 and the small-capacity
SAM 21 each have a memory cell array 40 to and from which data are written and read, a read row
address counter 41 which generates the read address for the rows of the memory cell array 40, a
write row address counter 42 which generates the write address for the rows of the memory cell
array 40, a read column address counter 43 which generates the read address for the columns of
the memory cell array 40, a write column address counter 44 which generates the write address
for the columns of the memory cell array 40, a data input line 45 which inputs input data Din,
and a data output line 46 which outputs output data Dout.

[0036] When data are written, the reset write signal RSTW is supplied together with the write
enable signal WE to the write row address counter 42 and the write column address counter 44,
the counter values are reset, the write clock WCK supplied thereafter is counted, and the write
address is generated. In step with the write clock WCK, the write data Din are supplied
continuously from the data input line 45 and stored in the memory cell array 40.

[0037] Likewise, when data are read, the reset read signal RSTR is supplied together with the
read enable signal RE to the read row address counter 41 and the read column address counter
43, the counter values are reset, the read clock RCK supplied thereafter is counted, and the
read address is generated. In step with the read clock RCK, the read data Dout are read
continuously from the memory cell array 40 and output to the data output line 46.
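
The behaviour of such a serial access memory can be modelled in a few lines. The sketch below
is an assumption-laden toy model, not the circuit of drawing 3: the class and method names are
hypothetical, and a single linear counter per direction stands in for each row/column counter
pair.

```python
class SerialAccessMemory:
    """Toy model of a SAM: no address input, only reset and clock signals."""

    def __init__(self, capacity):
        self.cells = [None] * capacity
        self.write_addr = 0  # stands in for write counters 42 and 44
        self.read_addr = 0   # stands in for read counters 41 and 43

    def reset_write(self):
        """Model of RSTW: return the write address counter to the start."""
        self.write_addr = 0

    def reset_read(self):
        """Model of RSTR: return the read address counter to the start."""
        self.read_addr = 0

    def write(self, value):
        """One WCK cycle: store Din at the current address, then advance."""
        self.cells[self.write_addr] = value
        self.write_addr += 1

    def read(self):
        """One RCK cycle: output Dout from the current address, then advance."""
        value = self.cells[self.read_addr]
        self.read_addr += 1
        return value
```

Because the caller never supplies an address, data can only be consumed in the order in which
they were written, which is why the layout of the operation data matters.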
[0038] Next, the operation of this parallel operation processor is explained, taking as an
example the computation of the product of a matrix and a vector. In general, the product Y of
the matrix A and the vector X shown in the following formula 3 is obtained by computing each
element yi of the product vector Y by the operation shown in the following formula 4.
[0039] 
[Equation 3] 


[0040] 
[Equation 4] 


[0041] Here, when the matrix A in formula 3 is a so-called sparse matrix, most of whose
elements are 0, as shown in the following formula 5, the product Y of the matrix A and the
vector X can be obtained by the operations shown in the following formulas 6.1 to 6.8.
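
As a hedged reading of the description, and not a reproduction of the published formulas, the
row-wise product for the 8x8 case presumably has the usual form shown first below, and for a
sparse row only the non-zero terms remain. The three specific rows use the non-zero positions
that the text states for the 1st, 2nd, and 4th rows; the remaining rows are left out because
their non-zero positions are not stated in the text.

```latex
\begin{align*}
y_i &= \sum_{j=1}^{8} a_{ij}\, x_j \qquad (i = 1, \dots, 8) \\
y_1 &= a_{11} x_1 + a_{12} x_2 \\
y_2 &= a_{21} x_1 + a_{22} x_2 + a_{24} x_4 \\
y_4 &= a_{42} x_2 + a_{44} x_4
\end{align*}
```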
[0042] 
[Equation 5] 
[0043]
[Equation 6] 

[0044] In this parallel operation processor, therefore, the control section 2 divides the
matrix A into rows, forms operation data for each row from the set of the element-count data
expressing the number of non-zero elements in the row, the location data (column numbers) of
all the non-zero elements, and the value data (values of the elements), and supplies the
operation data to the data-processing sections 1a-1m. Specifically, in the case of formula 5,
the non-zero elements of the 1st row are a11 and a12, so the element-count data are "2", the
location data are "1" and "2", and the value data are "a11" and "a12". Likewise, the non-zero
elements of the 2nd row are a21, a22, and a24, so the element-count data are "3", the location
data are "1", "2", and "4", and the value data are "a21", "a22", and "a24".
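
Forming the operation data for one row can be sketched as below. The function name
`encode_row` is hypothetical, and the row is assumed to arrive as a plain list of numbers;
column numbers are 1-based to match the text.

```python
def encode_row(row):
    """Return (count, column_numbers, values) for the non-zero elements of one row."""
    cols = [j + 1 for j, a in enumerate(row) if a != 0]  # location data
    vals = [a for a in row if a != 0]                    # value data
    return len(cols), cols, vals                         # count comes first
```

For a row whose 1st, 2nd, and 4th elements are non-zero, such as
`[5, 6, 0, 7, 0, 0, 0, 0]`, this yields a count of 3, locations `[1, 2, 4]`, and values
`[5, 6, 7]`, mirroring the 2nd-row example in the text.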

[0045] The control section 2 also compares the numbers of non-zero elements of the rows and
assigns the operation data to the data-processing sections 1a-1m so that their loads become
equal. Concretely, for example, the number of non-zero elements in each row of the matrix A is
recorded, and the rows assigned to each of the data-processing sections 1a-1m are chosen so
that the totals of non-zero elements become equal. In the case of formula 5, for example, the
numbers of non-zero elements in the 1st to 8th rows of the matrix A are 2, 3, 1, 2, 2, 1, 3,
and 2, respectively. With four data-processing sections 1a, 1b, 1c, and 1d, as shown, for
example, in drawing 4, the 1st and 4th rows are supplied to data-processing section 1a, the 2nd
and 3rd rows to data-processing section 1b, the 5th and 8th rows to data-processing section 1c,
and the 6th and 7th rows to data-processing section 1d.

[0046] As a result, each of the data-processing sections 1a-1d is supplied with the operation
data for two rows containing four non-zero elements in total, and the loads (amounts of
operation) of the data-processing sections 1a-1d are equal.
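
The text does not say how the rows are chosen, only that the totals should come out equal. One
common heuristic that achieves this for the example is sketched below: take the heaviest rows
first and give each to the currently lightest section. This greedy rule and the function name
are assumptions made for illustration, not the procedure of drawing 4.

```python
def assign_rows_balanced(nonzero_counts, n_sections):
    """Greedy assignment: heaviest rows first, each to the lightest section so far.

    Returns (rows_per_section, load_per_section); row numbers are 1-based.
    """
    rows = [[] for _ in range(n_sections)]
    loads = [0] * n_sections
    order = sorted(range(len(nonzero_counts)), key=lambda i: -nonzero_counts[i])
    for i in order:
        s = loads.index(min(loads))  # lightest section, lowest index on ties
        rows[s].append(i + 1)
        loads[s] += nonzero_counts[i]
    return rows, loads
```

For the counts 2, 3, 1, 2, 2, 1, 3, 2 and four sections, this gives every section a load of
four non-zero elements, although the row pairs it picks differ from those listed in the text.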

[0047] The operation part 11 of each of the data-processing sections 1a-1d then reads the
operation data (element-count data, location data, value data) stored in the large-capacity
SAM 20 and performs the operation according to the control signal from the control section 2.
[0048] Next, the actual handling of the operation data in the data-processing sections 1a-1d is
explained, using the flow chart shown in drawing 5 and taking as an example the computation of
the product of the matrix A and the vector X shown in formula 3. First, the operation data are
supplied to the data-processing sections 1a-1d in advance of the operation. As shown, for
example, in drawing 4, the operation data for each row of the matrix A are stored in the
large-capacity SAM 20, and the data of the vector X are stored in the high-speed
general-purpose RAM 22. Specifically, in the large-capacity SAM 20 of data-processing section
1a, the element-count data b1 (2) expressing the number of non-zero elements of the 1st row of
the matrix A, followed by two sets of location data cp (1, 2) and value data dp (a11, a12), are
stored consecutively; then the element-count data b2 (2) expressing the number of non-zero
elements of the 4th row of the matrix A, followed by two sets of location data cp (2, 4) and
value data dp (a42, a44), are stored. The data xq (q = 1, 2, ..., 8) of the vector X are stored
in the high-speed general-purpose RAM 22. In addition, although not illustrated, data
indicating the number of rows assigned to each of the data-processing sections 1a-1d (its
assigned row count) and the row numbers (i) are supplied to each of the data-processing
sections 1a-1d together with the operation data.
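
The serial layout for one data-processing section can be sketched as follows. The function name
is hypothetical, and the sketch assumes that each column number is immediately followed by its
value, which matches the order in which steps S6 and S7 of the flow chart read them.

```python
def build_sam_stream(assigned_rows):
    """Flatten rows into the serial order b, c, d, c, d, ..., b, c, d, ...

    assigned_rows: one list per row, holding (column_number, value) pairs
    for the non-zero elements only.
    """
    stream = []
    for row in assigned_rows:
        stream.append(len(row))      # element-count data b
        for col, val in row:
            stream.append(col)       # location data c (column number)
            stream.append(val)       # value data d
    return stream
```

For section 1a, `build_sam_stream([[(1, "a11"), (2, "a12")], [(2, "a42"), (4, "a44")]])` gives
`[2, 1, "a11", 2, "a12", 2, 2, "a42", 4, "a44"]`: ten words in place of the sixteen needed to
store two full rows of an 8x8 matrix.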

[0049] The operation data stored in the large-capacity SAM 20 as described above are read
continuously, at the time of the operation, in the order in which they were stored. In
data-processing section 1a, for example, the element-count data b1 (2) of the 1st row of the
matrix A are read first, then the two sets of location data cp and value data dp (a11, a12) are
read continuously, and then the element-count data b2 (2) of the 4th row of the matrix A and
the two sets of location data cp and value data dp (a42, a44) are read.
[0050] The control section 2 then supplies a control signal that starts the operation to the
data-processing sections 1a-1d through the control bus 4 and the internal control bus 14, and
processing proceeds to step S1. In step S1, the operation part 11 sets to 1 the count variable
k, which counts the rows processed, and proceeds to step S2.

[0051] In step S2, the operation part 11 compares the count variable k with its assigned row
count. If k is less than or equal to the assigned row count, it proceeds to step S3; if k is
greater than the assigned row count, all the assigned rows have already been processed, and
processing ends.

[0052] In step S3, the operation part 11 reads the number m of non-zero elements, i.e., the
element-count data b, from the large-capacity SAM 20, and proceeds to step S4.
[0053] In step S4, the operation part 11 sets the count variable L to 1 and the variable yi to
0, and proceeds to step S5.

[0054] In step S5, the operation part 11 compares the count variable L with the number m of
non-zero elements. If L is less than or equal to m, it proceeds to step S6; if L is greater
than m, processing of the row is already complete, and it proceeds to step S11.

[0055] It sets to step S6 and operation part 1 1 is the row number cp of a non-zero element (j), 
i.e., above-mentioned location data, from large capacity SAM 20. It reads and progresses to step 
S7. 

[0056] Setting to step S7, operation part 1 1 is, the value (aij) dp, i.e., the above-mentioned value 
data, of an element of the matrix from large capacity SAM 20. It reads and progresses to step S8. 
[0057] It is the value xj of the element of the vector X on step S8 and corresponding to the row 
number (j) of high-speed general -purpose RAM22 to the above-mentioned non-zero element in 
operation part 1 1 . It reads and progresses to step S9. 

[0058] Setting to step S9, operation part 1 1 is Variable yi. The value (aij) of the element of a 
matrix, and value xj of the element of Vector X A product is added and it progresses to step 10. 
[0059] In step S10, operation part 1 1 adds 1 to the value of the count variable L, and returns to 
step S5. That is, processing from the above-mentioned step S5 to step S10 is repeated about the 
data for one line, and they are the value (aij) of the element for one line, and the value xj of the 
element of Vector X. It is Variable yi about a product. It will add. 

[0060] When processing of one row is completed, the value of the count variable L becomes greater
than the number m of non-zero elements, and processing proceeds from step S5 to step S11. In step
S11, the operation part 11 writes the value of the variable yi obtained as described above, i.e.,
the value of the element yi of the product vector Y, into the high-speed general-purpose RAM 22.
In the following step S12, it adds 1 to the value of the count variable k and returns to step S2.
That is, the processing from step S2 to step S12 is repeated until all the assigned rows have
been processed.
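
The loop of steps S1 to S12 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the circuit described here: the serial access memory is modelled as an iterator that yields items in stored order, the RAM as a plain list, and the function name, argument names, and numeric values are all hypothetical.

```python
def run_data_processing_section(sam_stream, x, assigned_rows):
    """Follow steps S1-S12 for one data-processing section.

    sam_stream    -- operation data in stored order: for each assigned row,
                     the count m, then m pairs (column number, value)
    x             -- elements of Vector X (x[0] holds x1)
    assigned_rows -- 1-based row numbers (i) assigned to this section
    """
    sam = iter(sam_stream)      # serial access: items come out in stored order only
    y = {}                      # stands in for the RAM region that holds each yi
    k = 1                                       # S1: rows processed so far
    while k <= len(assigned_rows):              # S2: any assigned rows left?
        m = next(sam)                           # S3: number-of-elements data
        count_l, yi = 1, 0.0                    # S4: reset element counter and sum
        while count_l <= m:                     # S5: any non-zero elements left?
            j = next(sam)                       # S6: location data (column number)
            a_ij = next(sam)                    # S7: value data
            x_j = x[j - 1]                      # S8: random read from the RAM
            yi += a_ij * x_j                    # S9: accumulate the product
            count_l += 1                        # S10
        y[assigned_rows[k - 1]] = yi            # S11: store yi
        k += 1                                  # S12
    return y


# Section 1a holds rows 1 and 4; columns and values here are illustrative only.
stream_1a = [2, 1, 2.0, 2, 3.0,     # row 1: two non-zeros in columns 1 and 2
             2, 2, 5.0, 4, 7.0]     # row 4: two non-zeros in columns 2 and 4
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
result_1a = run_data_processing_section(stream_1a, x, [1, 4])
```

Note that the matrix data are only ever consumed front to back, which is why a serial access memory suffices for them, while Vector X is indexed by column number and therefore needs random access.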

[0061] As described above, each of the data-processing sections 1a-1d calculates the values yi of
its assigned elements of the product vector Y according to the above-mentioned formulas 6.1 to
6.8 and stores them in the high-speed general-purpose RAM 22, as shown, for example, in drawing 6.
The values yi of the elements of the product vector Y are then read out in response to a control
signal from the control section 2, and the control section 2 combines the values yi received from
the data-processing sections 1a-1d and ends the operation.
[0062] As is clear from the above explanation, when this invention is applied to a parallel
operation processor that obtains the product of a matrix and a vector, operations on zero
elements, whose value is 0, are not performed. The operation is therefore accelerated, and the
storage capacity of the storage section required for the operation is reduced. Moreover, since
operation data are assigned so that the loads of the data-processing sections are equal, as
described above, operation efficiency is raised and the overall operation time is shortened.

[0063] The technical idea of this invention is not limited to the above-mentioned example. The
operation to be processed is not limited to the product of a matrix and a vector; it can also be
applied to the product of a matrix and a matrix. Likewise, the fields into which the data are
divided are not limited to the rows of a matrix. In an operation that obtains the product of two
matrices, for example, dividing one matrix by row and the other matrix by column and forming
operation data from each allows the product to be obtained at high speed. Moreover, the data
subject to the operation are not restricted to matrices; it is clear that the invention is also
applicable to operations on data forming a continuous series.
[0064] 

[Effect of the Invention] As is clear from the above explanation, in the parallel operation
processor according to this invention, the control section divides matrix-form data into fields
and forms operation data from the set of the number-of-elements data expressing the number of
non-zero elements in a field, the location data expressing the locations of the non-zero
elements, and the value data expressing the values of the non-zero elements at the locations
corresponding to the series of location data. The control section supplies the operation data to
each data-processing section and controls the operation of each data-processing section. Each
data-processing section supplies the operation data received from the control section to its
storage section; each operation part reads the number-of-elements data stored in its storage
section and reads the location data and value data based on the number-of-elements data, and the
operations are performed in parallel. As a result, operations can be performed at high speed and
the storage capacity required for the operation can be reduced.
[0065] Moreover, in the parallel operation processor according to this invention, matrix-form
data are divided by row, operation data are formed for each row, and the operations are performed
in parallel in two or more data-processing sections, so that matrix operations can be performed
at high speed and the storage capacity required for the operation can be reduced.

[0066] Moreover, in the parallel operation processor according to this invention, the storage
section has a serial access memory that performs continuous writing and continuous read-out of
data, so that cost performance can be improved and the equipment can be made compact.
[0067] Moreover, in the parallel operation processor according to this invention, operation data
are supplied to the data-processing sections so that their loads are equal, so that a more
efficient and faster operation can be performed.


TECHNICAL FIELD 


[Industrial Application] This invention relates to a parallel operation processor in which two or
more operation parts process data in parallel, and in particular to a parallel operation
processor that operates on matrix-form data.


PRIOR ART 


[Description of the Prior Art] With the development and spread of computers in recent years,
large-scale scientific calculations that were previously considered infeasible have become
realistic. For example, in the finite difference method, the finite element method, and the
boundary element method, which are used for electromagnetic-field analysis and fluid flow
analysis, so-called matrix operations, such as solving very large systems of simultaneous
equations or computing the eigenvalues of a matrix, appear frequently in the course of the
analysis.

[0003] Moreover, enlarging the matrix is one way to raise analysis precision, but when the matrix
becomes large, the matrix operation takes a very long time even on a supercomputer capable of
high-speed operation, and faster processing is desired.

[0004] In matrix operations of such large size, storage that is both high-speed and large in
capacity has been demanded as the processing speed of the data-processing section has increased.
However, since storage capacity and operating speed are generally in a trade-off relation,
storage that is both fast and large is very expensive and makes the equipment large.
[0005] For this reason, for example, as shown in drawing 7, a cache memory 62, which is
high-speed but comparatively small in capacity, is provided between the data-processing section
(CPU) 61 and the large-capacity main storage 63. Data are read in advance from the main storage
63 into the cache memory 62 through the bus line 65, and the CPU 61 reads the data from the cache
memory 62 through the bus line 64, which realizes high-speed read-out of data and speeds up the
operation.

[0006] Moreover, as shown in drawing 8, in order to perform the same operation on a series of
data at high speed, as in matrix operations, a processing unit (a so-called supercomputer) is
known that has a plurality of vector registers 77, each storing a vector, i.e., a series of data.
It operates on the series of data stored in the vector registers 77 with computing elements such
as a floating-point (FP) adder 70, multiplier 71, and divider 72, and accelerates the operation
by pipelining these processes.
[0007] In this processing unit, prior to an operation, data are transferred from the main storage
81 to the vector registers 77 by the vector I/O device (vector I/O) 79 through the data lines 78
and 80. The data transferred to the vector registers 77 are supplied to the computing elements
70-72 through the input bus line 73 and, after the operation is completed, are written back to
the vector registers 77 through the output bus line 74. The operation results written in the
vector registers 77 are then stored again in the main storage 81 through the vector I/O 79.


TECHNICAL PROBLEM 


[Problem(s) to be Solved by the Invention] However, in the processing unit using the
above-mentioned cache memory 62, when data other than those read in advance from the main storage
63 into the cache memory 62 are needed (that is, on a cache miss), they must be read from the
main storage 63. In operations on large matrices, the capacity of the cache memory 62 is
generally too small, so data not held in the cache memory 62 must be read from the main storage
63 each time. The CPU 61 stalls during this read-out, and the overall operation speed falls.
[0009] Likewise, in the processing unit having the above-mentioned vector registers 77, the
capacity of the vector registers 77 is small. In operations on large matrices, the elements
(data) of the matrix that cannot be held in the vector registers 77 therefore generally have to
be read from the main storage 81 before they can be operated on, and the operation speed is
reduced.

[0010] The present applicant therefore focused on characteristics of matrix operations such as
the order in which data are read, and previously proposed, as Japanese Patent Application No.
5-149597, the parallel operation processor shown in drawing 9 and drawing 10. It operates in
parallel in two or more data-processing sections, each of which has a serial access memory that
performs continuous writing and continuous read-out of data, thereby accelerating the operation
while achieving lower cost and smaller size.

[0011] As shown in drawing 9, this parallel operation processor has m data-processing sections
101a, 101b, 101c, ... 101m that perform operations, a control section 102 that outputs control
signals for controlling the operation of the data-processing sections 101a-101m, operation data,
and the like, a data bus 103 that supplies the operation data and the like from the control
section 102 to the data-processing sections 101a-101m, and a control bus 104 that supplies the
control signals from the control section 102 to the data-processing sections 101a-101m.

[0012] As shown in drawing 10, each of the data-processing sections 101a-101m has a storage
section 110 that stores the operation data supplied from the control section 102, an operation
part 111 that operates on the data stored in the storage section 110, a buffer 113 that
communicates with the data bus 103 through an internal data bus 112, a buffer 115 that
communicates with the control bus 104 through an internal control bus 114, and a control signal
generating section 116 that generates the control signals required for communication and supplies
them to the buffer 115 and the operation part 111.
[0013] Furthermore, as shown in drawing 10, the storage section 110 consists of a large-capacity
continuous I/O memory (hereinafter, large-capacity serial access memory (SAM)) 120 that performs
continuous writing and continuous read-out of data, a small-capacity SAM 121, and a high-speed
general-purpose RAM 122 of modest capacity that performs random writing and random read-out of
data.
[0014] In this parallel operation processor, when obtaining the product of the matrix A shown,
for example, in the following formula 1 and Vector X, the control section 102 divides Matrix A
and assigns and supplies the data of each row to the data-processing sections. In the case of
formula 1, for example, two rows of the 8x8 matrix are assigned and supplied to each of the four
data-processing sections 101a-101d. For example, the data of the 1st and 5th rows are supplied to
data-processing section 101a, the data of the 2nd and 6th rows to data-processing section 101b,
the data of the 3rd and 7th rows to data-processing section 101c, and the data of the 4th and 8th
rows to data-processing section 101d.
[0015] 
[Equation 1] 

[0016] The control section 102 also supplies the elements xi of Vector X to the data-processing
sections 101a-101d. Each of the data-processing sections 101a-101d stores its rows of Matrix A in
the large-capacity SAM 120, as shown in drawing 11, and stores the elements xi of Vector X in the
RAM 122. At the time of an operation, the operation part 111 in each of the data-processing
sections 101a-101d continuously reads the elements aij of each row, reads the element of Vector X
corresponding to each element aij from the RAM 122, performs the operation shown in the following
formula 2, and stores the resulting element yi of Vector Y in the RAM 122. The control section
102 then combines the elements yi of Vector Y obtained in the data-processing sections 101a-101d
to obtain Vector Y. As a result, this parallel operation processor can perform matrix operations
at high speed.
[0017] 
[Equation 2] 

[0018] On the other hand, the large matrices that appear in scientific calculation, as typified
by the finite difference method, the finite element method, linear programming, and electric
circuit analysis, are in many cases sparse matrices, most of whose elements are 0. If all the
elements of such a sparse matrix are stored at the time of an operation, a huge storage capacity
is needed.

[0019] Processing units were therefore considered that shorten the operation time by not
performing operations on the zero elements of the sparse matrix, whose value is 0, or that do not
store the zero elements.

[0020] However, a processing unit that skips operations on the zero elements of the sparse matrix
needs, at the time of the operation, position information on the non-zero data in the sparse
matrix. It must either store all the elements and test each one at operation time, or store the
position information of the non-zero data beforehand. In the latter case the time needed to read
this information increases the overall operation time, and it was difficult to apply this
approach to the parallel operation processor using the above-mentioned serial access memory.

[0021] Moreover, a processing unit that does not store the zero elements of the sparse matrix
must store the position information of the operation data in addition to the operation data
themselves. The storage capacity required for the operation therefore increases, and the overall
operation time increases by the time needed to read the position information.

[0022] This invention has been made in view of the above problems, and aims to provide a parallel
operation processor that can operate on large matrices at high speed, can reduce the storage
capacity required for the operation, can improve cost performance, and can be made compact.


MEANS 


[Means for Solving the Problem] In order to solve the above-mentioned problem, the parallel
operation processor according to this invention has two or more data-processing sections, each
equipped with a storage section that stores operation data and an operation part that operates on
the operation data stored in the storage section, and a control section. The control section
divides data into fields, forms operation data from the set of the number-of-elements data
expressing the number of non-zero elements in a field, the location data expressing the locations
of the non-zero elements within the field, and the value data expressing the values of the
non-zero elements at the locations corresponding to the series of location data, supplies the
operation data to each data-processing section, and controls the operation of each
data-processing section. The processor is characterized in that each data-processing section
reads the number-of-elements data stored in its storage section, reads the location data and
value data based on the number-of-elements data, and operates on them in its operation part, so
that the operation on the data is performed in parallel in the two or more data-processing
sections.
[0024] Moreover, the parallel operation processor according to this invention is characterized in
that the control section divides matrix-form data by row, uses the number of non-zero elements in
one row as the number-of-elements data and the locations of the non-zero elements within the row
as the location data, forms operation data for each row of the matrix-form data, and supplies
them to each data-processing section.
[0025] Moreover, the parallel operation processor according to this invention is characterized in
that the storage section has a serial access memory that performs continuous writing and
continuous read-out of data.

[0026] Moreover, the parallel operation processor according to this invention is characterized in
that the control section supplies operation data to each data-processing section so that the
loads of the data-processing sections are equal.


OPERATION 


[Function] In the parallel operation processor according to this invention, the control section
divides matrix-form data into fields, forms operation data from the set of the number-of-elements
data expressing the number of non-zero elements in a field, the location data expressing the
locations of the non-zero elements, and the value data expressing the values of the non-zero
elements at the locations corresponding to the series of location data, supplies the operation
data to each data-processing section, and controls the operation of each data-processing section.
Each data-processing section supplies the operation data received from the control section to its
storage section, where they are stored. Each data-processing section then reads the
number-of-elements data stored in the storage section, reads the location data and value data
based on the number-of-elements data, and the operations are performed in parallel.

[0028] Moreover, in the parallel operation processor according to this invention, the control
section divides matrix-form data by row, uses the number of non-zero elements in one row as the
number-of-elements data and the locations of the non-zero elements within the row as the location
data, forms operation data for each row of the matrix-form data from these and the value data
expressing the values of the non-zero elements at the locations corresponding to the series of
location data, supplies the operation data to the data-processing sections, and controls the
operation of each data-processing section. Each data-processing section supplies the operation
data received from the control section to its storage section, where they are stored. Each
data-processing section then reads the number-of-elements data stored in the storage section,
reads the location data and value data based on the number-of-elements data, and the operations
are performed in parallel.
[0029] Moreover, in the parallel operation processor according to this invention, the control
section divides data into fields, forms operation data from the set of the number-of-elements
data expressing the number of non-zero elements in a field, the location data expressing the
locations of the non-zero elements, and the value data expressing the values of the non-zero
elements at the locations corresponding to the series of location data, supplies the operation
data to each data-processing section, and controls the operation of each data-processing section.
Each data-processing section supplies the operation data received from the control section
continuously to its serial access memory, where they are stored. Each data-processing section
then reads the number-of-elements data stored in the serial access memory, continuously reads the
location data and value data based on the number-of-elements data, and the operations are
performed in parallel.

[0030] Moreover, in the parallel operation processor according to this invention, the control
section divides data into fields, forms operation data from the set of the number-of-elements
data expressing the number of non-zero elements in a field, the location data expressing the
locations of the non-zero elements, and the value data expressing the values of the non-zero
elements at the locations corresponding to the series of location data, supplies the operation
data to the data-processing sections so that their loads are equal, and controls the operation of
each data-processing section. Each data-processing section supplies the operation data received
from the control section to its storage section, where they are stored. Each data-processing
section then reads the number-of-elements data from the storage section, reads the location data
and value data based on the number-of-elements data, and operates on them in its operation part,
and the operations are performed in parallel.


EXAMPLE 


[Example] Hereafter, a preferred example of the parallel operation processor according to this
invention is explained in detail with reference to the drawings.

[0032] As shown, for example, in drawing 1, this parallel operation processor has m
data-processing sections 1a, 1b, 1c, ... 1m that perform operations, a control section 2 that
outputs control signals for controlling the data-processing sections 1a-1m, operation data, and
the like, a data bus 3 that supplies the operation data from the control section 2 to the
data-processing sections 1a-1m, and a control bus 4 that supplies the control signals from the
control section 2 to the data-processing sections 1a-1m.

[0033] As shown, for example, in drawing 2, each of the data-processing sections 1a-1m has a
storage section 10 that stores the operation data supplied from the control section 2, an
operation part 11 that operates on the data stored in the storage section 10, a buffer 13 that
communicates with the data bus 3 through an internal data bus 12, a buffer 15 that communicates
with the control bus 4 through an internal control bus 14, and a control signal generating
section 16 that generates the control signals required for communication and supplies them to the
buffer 15 and the operation part 11. Each of the data-processing sections 1a-1m performs
operations according to the control signals from the control section 2.
[0034] As shown in drawing 2, the storage section 10 consists of a large-capacity continuous I/O
memory (hereinafter, large-capacity serial access memory (SAM)) 20 that performs continuous
writing and continuous read-out of data, a small-capacity SAM 21, and a high-speed
general-purpose RAM 22 of modest capacity that performs random writing and random read-out of
data.
[0035] As shown, for example, in drawing 3, the large-capacity SAM 20 and the small-capacity SAM
21 each have a memory cell array 40 to and from which data are written and read, a read row
address counter 41 that generates the read address for the rows of the memory cell array 40, a
write row address counter 42 that generates the write address for the rows of the memory cell
array 40, a read column address counter 43 that generates the read address for the columns of the
memory cell array 40, a write column address counter 44 that generates the write address for the
columns of the memory cell array 40, a data input line 45 that inputs input data Din, and a data
output line 46 that outputs output data Dout.

[0036] At the time of writing, the reset write signal RSTW is supplied to the write row address
counter 42 and the write column address counter 44 together with the write enable signal WE, the
counter values are reset, the write clock WCK supplied thereafter is counted, and the write
address is generated. In step with the write clock WCK, the write data Din are supplied
continuously from the data input line 45 and stored in the memory cell array 40.

[0037] Likewise, at the time of read-out, the reset read signal RSTR is supplied to the read row
address counter 41 and the read column address counter 43 together with the read enable signal
RE, the counter values are reset, the read clock RCK supplied thereafter is counted, and the read
address is generated. In step with the read clock RCK, the read data Dout are read continuously
from the memory cell array 40 and output to the data output line 46.
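
The counter-driven addressing of the SAM can be modelled in a few lines of Python. This is a behavioural sketch only, under assumptions the text does not state: the column counter is taken to advance on every clock and to carry into the row counter, and both counters wrap at the end of the array. The class and method names are hypothetical.

```python
class SerialAccessMemory:
    """Behavioural model: separate read and write address counters, no random access."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.cells = [[None] * cols for _ in range(rows)]   # memory cell array
        self.w_row = self.w_col = 0     # write row / column address counters
        self.r_row = self.r_col = 0     # read row / column address counters

    def reset_write(self):
        """Model of RSTW: clear the write address counters."""
        self.w_row = self.w_col = 0

    def reset_read(self):
        """Model of RSTR: clear the read address counters."""
        self.r_row = self.r_col = 0

    def write(self, din):
        """Model of one WCK cycle with WE asserted: store Din, advance the counters."""
        self.cells[self.w_row][self.w_col] = din
        self.w_col += 1
        if self.w_col == self.cols:             # assumed carry into the row counter
            self.w_col = 0
            self.w_row = (self.w_row + 1) % self.rows

    def read(self):
        """Model of one RCK cycle with RE asserted: output Dout, advance the counters."""
        dout = self.cells[self.r_row][self.r_col]
        self.r_col += 1
        if self.r_col == self.cols:
            self.r_col = 0
            self.r_row = (self.r_row + 1) % self.rows
        return dout


# Write five items continuously, then read them back in the same order.
sam = SerialAccessMemory(rows=2, cols=3)
sam.reset_write()
for item in [2, 1, "a11", 2, "a12"]:
    sam.write(item)
sam.reset_read()
read_back = [sam.read() for _ in range(5)]
```

Because the address comes from counters rather than from an address bus, the only way to reach an item is to clock past everything stored before it, which matches the strictly sequential way the operation data are consumed.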
[0038] Next, the operation of this parallel operation processor is explained, taking as an
example the operation that obtains the product of a matrix and a vector. In general, the product
Y of the matrix A and the vector X shown in the following formula 3 is obtained by computing each
element yi of the product vector Y by the operation shown in the following formula 4.
[0039] 
[Equation 3] 


[0040] 
[Equation 4] 


[0041] Here, when Matrix A in the operation of the above-mentioned formula 3 is a so-called
sparse matrix, in which most of the elements are 0, as shown in the following formula 5, the
product Y of Matrix A and Vector X can be obtained by the operations shown in the following
formulas 6.1 to 6.8.
[0042] 
[Equation 5] 
[0043]
[Equation 6] 
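
The formulas themselves are not reproduced in this text. As a sketch only, assuming the standard definition of the matrix-vector product for the 8x8 case, and using the non-zero elements that the description names for the 1st row (a11, a12) and the 2nd row (a21, a22, a24), the general sum and the first two sparse sums would take the following form. The numbering and layout of the original formulas 4 and 6.1 to 6.8 are not known and are not claimed here.

```latex
% General element of the product vector (dense case)
y_i = \sum_{j=1}^{8} a_{ij}\, x_j , \qquad i = 1, \dots, 8

% Sparse case: only the non-zero elements of each row contribute
y_1 = a_{11} x_1 + a_{12} x_2
y_2 = a_{21} x_1 + a_{22} x_2 + a_{24} x_4
```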


[0044] Accordingly, in this parallel operation processor, the control section 2 divides Matrix A
into rows, forms operation data for each row from the set of the number-of-elements data
expressing the number of non-zero elements in the row, the location data (column numbers) of all
the non-zero elements, and the value data (values of the elements), and supplies these operation
data to the data-processing sections 1a-1m. Specifically, in the case of the above-mentioned
formula 5, for example, since the non-zero elements of the 1st row are a11 and a12, the
number-of-elements data is "2" and the location data are "1" and "2", respectively. Likewise,
since the non-zero elements of the 2nd row are a21, a22, and a24, the number-of-elements data is
"3", the location data are "1", "2", and "4", respectively, and the value data are "a21", "a22",
and "a24", respectively.
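
The per-row operation data can be sketched as a flat list holding the count followed by (column number, value) pairs. The interleaved ordering is an assumption taken from the read order of steps S6 and S7; the function name and the sample values are hypothetical.

```python
def form_operation_data(row):
    """Encode one matrix row as [count, col_1, val_1, col_2, val_2, ...].

    Column numbers are 1-based, matching the location data in the description.
    """
    pairs = [(j, a) for j, a in enumerate(row, start=1) if a != 0]
    data = [len(pairs)]                 # number-of-elements data
    for j, a in pairs:
        data.extend([j, a])             # location data, then value data
    return data


# A row shaped like the 2nd row of the example: non-zeros in columns 1, 2, and 4.
# The numeric values are made up for illustration.
row_2 = [0.5, 1.5, 0.0, 2.5, 0.0, 0.0, 0.0, 0.0]
encoded_row_2 = form_operation_data(row_2)
```

A row with k non-zero elements takes 2k + 1 stored items in this layout, compared with 8 items for the full row, so the saving grows as the rows become sparser.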

[0045] Moreover, the control section 2 compares the numbers of non-zero elements of the rows, and 
assigns operation data to the data-processing sections 1a-1m so that the loads on the 
data-processing sections 1a-1m become equal. Concretely, the operation data are allocated by, for 
example, storing the number of non-zero elements of each row of the matrix A and choosing the 
rows assigned to each of the data-processing sections 1a-1m so that these numbers of non-zero 
elements become equal. For example, in the case of formula 5 above, the numbers of non-zero 
elements of the 1st to 8th rows of the matrix A are 2, 3, 1, 2, 2, 1, 3, and 2, respectively. 
Therefore, in the case where there are four data-processing sections 1a, 1b, 1c, and 1d, as 
shown for example in Drawing 4, the data of the 1st and 4th rows are supplied to data-processing 
section 1a, the data of the 2nd and 3rd rows are supplied to data-processing section 1b, the data 
of the 5th and 8th rows are supplied to data-processing section 1c, and the data of the 6th and 
7th rows are supplied to data-processing section 1d. 
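
The text does not say which selection rule the control section uses. One simple possibility is a greedy heuristic: take the rows in order of decreasing non-zero count and always give the next row to the least-loaded section. The sketch below is an assumed technique and `assign_rows` is a hypothetical name. For the example counts it yields a load of four per section, although with a different pairing of rows than the one in Drawing 4.

```python
import heapq


def assign_rows(nnz_per_row, num_sections):
    """Greedy balancing: heaviest rows first, each to the least-loaded section."""
    heap = [(0, s) for s in range(num_sections)]  # (current load, section index)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_sections)]
    order = sorted(range(len(nnz_per_row)), key=lambda i: -nnz_per_row[i])
    for i in order:
        load, s = heapq.heappop(heap)
        assignment[s].append(i + 1)               # store 1-based row numbers
        heapq.heappush(heap, (load + nnz_per_row[i], s))
    loads = [sum(nnz_per_row[r - 1] for r in rows) for rows in assignment]
    return assignment, loads


# Non-zero counts of rows 1 to 8 from the example, four sections 1a to 1d.
assignment, loads = assign_rows([2, 3, 1, 2, 2, 1, 3, 2], 4)
```

A greedy rule of this kind does not guarantee perfectly equal loads for every matrix. It is shown only because it reproduces the equal split in this example.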

[0046] Consequently, operation data for two rows are supplied to each of the data-processing 
sections 1a-1d, the number of non-zero elements supplied to each of the data-processing sections 
1a-1d is four, and the loads (amounts of operation) on the data-processing sections 1a-1d are 
equal. 

[0047] The operation part 11 of each of the data-processing sections 1a-1d then performs the 
operation by reading the above-mentioned operation data (element number data, location data, 
value data) stored in the large capacity SAM 20, according to the control signal from the 
control section 2. 
[0048] Next, the actual handling of the operation data in each of the data-processing sections 
1a-1d is explained using the flow chart shown in Drawing 5, taking as an example the case where 
the product of the matrix A shown in formula 3 above and the vector X is calculated. First, prior 
to the operation, the operation data are supplied to the data-processing sections 1a-1d and, as 
shown for example in Drawing 4, the operation data for each row of the matrix A are stored in the 
large capacity SAM 20, and the data of the vector X are stored in the high-speed general-purpose 
RAM 22. Specifically, in the large capacity SAM 20 of data-processing section 1a, the element 
number data b1 (2) expressing the number of non-zero elements of the 1st row of the matrix A and 
two sets of location data cp and value data dp (1, a11, 2, a12) are stored consecutively, and 
then the element number data b2 (2) expressing the number of non-zero elements of the 4th row of 
the matrix A and two sets of location data cp and value data dp (2, a42, 4, a44) are stored. 
Furthermore, the data xq (q = 1, 2, ..., 8) of the vector X are stored in the high-speed 
general-purpose RAM 22. Moreover, although not illustrated, data indicating the number of rows 
assigned to each data-processing section (its assigned row count) and the row numbers (i) are 
supplied to the data-processing sections 1a-1d together with the operation data. 
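
The memory contents of data-processing section 1a described in this paragraph can be pictured as follows. This is only an illustrative model: the class `SectionMemory` and its field names are hypothetical, and strings stand in for the element values.

```python
from dataclasses import dataclass


@dataclass
class SectionMemory:
    sam: list            # sequential stream: count, (column, value) pairs, next count, ...
    ram_x: list          # full vector X; element x_j is at index j - 1
    assigned_rows: list  # row numbers i handled by this section


# Section 1a in the example: rows 1 and 4 of the matrix A.
section_1a = SectionMemory(
    sam=[2, 1, "a11", 2, "a12",   # row 1: b = 2, then (c, d) pairs
         2, 2, "a42", 4, "a44"],  # row 4: b = 2, then (c, d) pairs
    ram_x=[f"x{q}" for q in range(1, 9)],
    assigned_rows=[1, 4],
)
```

Because the stream is only ever read front to back, a sequential-access memory suffices for it, while the vector X needs random access by column number.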

[0049] The operation data stored in the large capacity SAM 20 as described above are read out 
consecutively at the time of the operation, in the order in which they were stored. For example, 
in the above-mentioned data-processing section 1a, first the element number data b1 (2) of the 
1st row of the matrix A is read, then the two sets of location data cp and value data dp 
(1, a11, 2, a12) are read consecutively, and further the element number data b2 (2) of the 4th 
row of the matrix A and the two sets of location data cp and value data dp (2, a42, 4, a44) are 
read. 
[0050] The control section 2 then supplies a control signal that starts the operation to the 
data-processing sections 1a-1d through the control bus 4 and the internal control bus 14, and 
processing proceeds to step S1. In step S1, the operation part 11 sets to 1 the count variable k, 
which counts the number of rows processed, and proceeds to step S2. 

[0051] In step S2, the operation part 11 compares the value of the count variable k with its 
assigned row count. If the count variable k is less than or equal to the assigned row count, it 
proceeds to step S3; if the count variable k is greater than the assigned row count, processing 
for the assigned rows has already been completed, so processing ends. 

[0052] In step S3, the operation part 11 reads the number m of non-zero elements, i.e., the 
above-mentioned element number data b, from the large capacity SAM 20, and proceeds to step S4. 
[0053] In step S4, the operation part 11 sets the value of the count variable L to 1, sets the 
value of the variable yi to 0, and proceeds to step S5. 

[0054] In step S5, the operation part 11 compares the value of the count variable L with the 
number m of non-zero elements. If the value of the count variable L is less than or equal to m, 
it proceeds to step S6; if the value of the count variable L is greater than m, processing for 
one row has already been completed, so it proceeds to step S11. 

[0055] In step S6, the operation part 11 reads the column number (j) of a non-zero element, i.e., 
the above-mentioned location data cp, from the large capacity SAM 20, and proceeds to step S7. 

[0056] In step S7, the operation part 11 reads the value (aij) of the matrix element, i.e., the 
above-mentioned value data dp, from the large capacity SAM 20, and proceeds to step S8. 
[0057] In step S8, the operation part 11 reads, from the high-speed general-purpose RAM 22, the 
value xj of the element of the vector X corresponding to the column number (j) of the 
above-mentioned non-zero element, and proceeds to step S9. 

[0058] In step S9, the operation part 11 adds the product of the value (aij) of the matrix element 
and the value xj of the element of the vector X to the variable yi, and proceeds to step S10. 
[0059] In step S10, the operation part 11 adds 1 to the value of the count variable L, and returns 
to step S5. That is, the processing from step S5 to step S10 is repeated for the data of one row, 
so that the products of the values (aij) of the elements of that row and the values xj of the 
corresponding elements of the vector X are accumulated in the variable yi. 

[0060] After processing for one row is completed, the value of the count variable L becomes 
greater than the number m of non-zero elements, and processing proceeds from step S5 to step S11. 
In step S11, the operation part 11 writes the value of the variable yi obtained as described 
above, i.e., the value of the element yi of the product vector Y, into the high-speed 
general-purpose RAM 22. In the following step S12, 1 is added to the value of the count variable 
k, and processing returns to step S2. That is, the processing from step S2 to step S12 is 
repeated until processing for the assigned row count is completed. 
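
Steps S1 to S12 can be followed in code as below. This is a sketch under stated assumptions rather than the flow chart of Drawing 5 itself: the function name `run_section`, the Python list standing in for the SAM, and the explicit read position `pos` are illustrative, and integer values replace the symbolic elements.

```python
def run_section(sam, x, assigned_rows):
    """Process one section's sequential stream; return {row number i: y_i}."""
    pos = 0   # read position in the sequential stream (models sequential access)
    y = {}    # stands in for the results written to the RAM
    k = 1                                  # S1: rows processed so far, plus one
    while k <= len(assigned_rows):         # S2: compare k with assigned row count
        m = sam[pos]                       # S3: read number of non-zero elements
        pos += 1
        L = 1                              # S4: reset element counter and sum
        yi = 0
        while L <= m:                      # S5: more elements in this row?
            j = sam[pos]                   # S6: read column number (location data)
            pos += 1
            a_ij = sam[pos]                # S7: read element value (value data)
            pos += 1
            x_j = x[j - 1]                 # S8: read x_j by column number
            yi += a_ij * x_j               # S9: accumulate the product
            L += 1                         # S10
        y[assigned_rows[k - 1]] = yi       # S11: store y_i
        k += 1                             # S12
    return y


# Section 1a with placeholder values a11=1, a12=2, a42=3, a44=4 and x = (1, ..., 8).
result = run_section([2, 1, 1, 2, 2, 2, 2, 3, 4, 4], list(range(1, 9)), [1, 4])
# result == {1: 5, 4: 22}
```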

[0061] As described above, each of the data-processing sections 1a-1d calculates the values yi of 
its assigned elements of the product vector Y according to formulas 6.1 to 6.8 above, and stores 
them in the high-speed general-purpose RAM 22 as shown, for example, in Drawing 6. The values yi 
of the elements of the product vector Y are then read out in response to a control signal from 
the control section 2, the control section 2 combines the values yi of the elements of the 
product vector Y from the data-processing sections 1a-1d, and the operation ends. 
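
The overall flow (sections working concurrently, then the control section combining their results) might be modelled as follows. This sketch assumes Python threads as a stand-in for the hardware sections, which models the structure only and not the speed-up. The names `section_products` and `parallel_product` and the 4-by-4 matrix are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def section_products(task):
    """Compute y_i for every row in one section's stream (count, col, val, ...)."""
    stream, x, row_numbers = task
    pos, out = 0, {}
    for i in row_numbers:
        m = stream[pos]
        pos += 1
        yi = 0
        for _ in range(m):
            col, val = stream[pos], stream[pos + 1]
            pos += 2
            yi += val * x[col - 1]
        out[i] = yi
    return out


def parallel_product(tasks, n):
    """Run all sections concurrently and merge their partial results into Y."""
    y = [0] * n
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        for partial in pool.map(section_products, tasks):
            for i, yi in partial.items():
                y[i - 1] = yi
    return y


# Hypothetical 4x4 matrix: row 1 = (1, 2, 0, 0), row 2 = (0, 3, 0, 0),
# row 3 = (0, 0, 4, 0), row 4 = (0, 5, 0, 6); each section holds 3 non-zeros.
x = [1, 2, 3, 4]
tasks = [
    ([2, 1, 1, 2, 2, 1, 3, 4], x, [1, 3]),  # section holding rows 1 and 3
    ([1, 2, 3, 2, 2, 5, 4, 6], x, [2, 4]),  # section holding rows 2 and 4
]
Y = parallel_product(tasks, 4)  # [5, 6, 12, 34]
```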
[0062] As is clear from the above explanation, in this parallel operation processor, in which 
this invention is applied to a parallel operation processor that finds the product of a matrix 
and a vector, no operation is performed on zero elements, whose value is 0. The operation can 
therefore be accelerated, and the storage capacity of the storage section required for the 
operation can be reduced. Moreover, since operation data are assigned to the data-processing 
sections so that their loads become equal as described above, operation efficiency can be 
improved and the overall operation time can be shortened. 
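
The size of the saving for the 8-row example can be estimated with a short calculation. This is a rough count that assumes every stored item (a count, a column number, or a value) occupies one storage word and that the matrix has 8 columns. Neither assumption is stated in the text.

```python
nnz_per_row = [2, 3, 1, 2, 2, 1, 3, 2]  # non-zero counts of rows 1 to 8 (from [0045])
n = len(nnz_per_row)

dense_words = n * n                                     # every element stored
compressed_words = sum(1 + 2 * c for c in nnz_per_row)  # count + (column, value) pairs

dense_multiplies = n * n               # one multiply per element, zeros included
sparse_multiplies = sum(nnz_per_row)   # one multiply per non-zero element
```

Under these assumptions the matrix data takes 40 words instead of 64, and the product needs 16 multiplications instead of 64.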

[0063] In addition, the technical idea of this invention is not limited to the above example. The 
operation to be processed is not limited to the product of a matrix and a vector as described 
above, and the invention can also be applied to an operation that finds the product of two 
matrices. Moreover, the field into which data is divided is not limited to the row unit of the 
matrix described above; in an operation that finds the product of two matrices, the operation 
can be performed at high speed by dividing one matrix into rows, dividing the other matrix into 
columns, and forming the operation data accordingly. Moreover, it is clear that the data subject 
to the operation are not limited to the matrix described above either, and that the invention is 
also applicable to operations on data that form a single continuous sequence. 
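
One way to read the matrix-by-matrix case is sketched below: one matrix is compressed row by row, the other column by column, and each output element is a sum over the positions that are non-zero in both. This is an interpretation, not a procedure given in the text, and the names `compress` and `sparse_matmul` are hypothetical.

```python
def compress(vec):
    """Return {position: value} for the non-zero entries of a row or column (1-based)."""
    return {p + 1: v for p, v in enumerate(vec) if v != 0}


def sparse_matmul(a_rows, b_cols):
    """C[i][j] = sum over positions non-zero in both row i of A and column j of B."""
    rows = [compress(r) for r in a_rows]
    cols = [compress(c) for c in b_cols]
    return [[sum(v * c[p] for p, v in r.items() if p in c) for c in cols]
            for r in rows]


A = [[1, 0], [0, 2]]        # given row by row
B_cols = [[3, 0], [4, 5]]   # columns of B = [[3, 4], [0, 5]]
C = sparse_matmul(A, B_cols)  # [[3, 4], [0, 10]]
```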


DESCRIPTION OF DRAWINGS 


[Brief Description of the Drawings] 

[Drawing 1] It is the block diagram showing the configuration of the parallel operation processor 
to which this invention is applied. 

[Drawing 2] It is the block diagram showing the concrete configuration of the data-processing 
section which constitutes the above-mentioned parallel operation processor. 

[Drawing 3] It is the block diagram showing the concrete configuration of the SAM which 
constitutes the storage section of the above-mentioned data-processing section. 

[Drawing 4] It is a drawing for explaining the operation of the above-mentioned parallel 
operation processor. 

[Drawing 5] It is a flow chart for explaining the operation of the above-mentioned parallel 
operation processor. 

[Drawing 6] It is a drawing for explaining the operation of the above-mentioned parallel 
operation processor. 

[Drawing 7] It is the block diagram showing the configuration of the conventional processing 
unit. 

[Drawing 8] It is the block diagram showing the configuration of the conventional processing 
unit. 

[Drawing 9] It is the block diagram showing the configuration of the conventional parallel 
operation processor. 

[Drawing 10] It is the block diagram showing the concrete configuration of the data-processing 
section which constitutes the conventional parallel operation processor. 

[Drawing 11] It is a drawing for explaining the operation of the conventional parallel operation 
processor. 

1 Data-processing section 

2 Control section 

3 Data bus 

4 Control bus 

10 Storage section 

11 Operation part 

12 Internal data bus 

13, 15 Buffer 

14 Internal control bus 

16 Control signal generating section 

20 Large capacity SAM 

21 Small capacity SAM 

22 High-speed general-purpose RAM 

b The number data of elements 

c Location data 

d Value data 

xj Vector data 

yi Vector 